SWAT: Hierarchical Stream Summarization in Large Networks
Abstract
The problem of maintaining statistics and aggregates over data streams has gained popularity in recent years, especially in telecommunications network monitoring, trend-related analysis, web-click streams, stock tickers, and other time-variant data. The amount of data generated in such applications can become too large to store or, if stored, too large to scan multiple times. We consider queries over data streams that are biased towards the more recent values. We develop a technique that summarizes a dynamic stream incrementally at multiple resolutions. This approximation can be used to answer point queries, range queries, and inner product queries. Moreover, the precision of answers can be changed adaptively by a client. Later, we extend the above technique to work in a distributed setting, specifically in a large network where a central site summarizes the stream and clients ask queries. We minimize the message overhead by deciding what and where to replicate using an adaptive replication scheme. We maintain a hierarchy of approximations that change adaptively based on the query and update rates. We show experimentally that our technique performs better than existing techniques: up to times better in terms of approximation quality, up to four orders of magnitude better in response time, and up to five times better in terms of message complexity.
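The abstract does not spell out the data structure, so the following is only a generic illustration of the idea of summarizing a stream at multiple resolutions with a bias towards recent values; it is not the SWAT algorithm itself. The sketch keeps the newest values exactly and merges older values into buckets that each cover twice as many items as the level before. The class name `MultiResolutionSummary`, the parameter `per_level`, and the method names are all hypothetical.

```python
from collections import deque


class MultiResolutionSummary:
    """Illustrative only: recent values are kept at fine resolution,
    older values are averaged over progressively larger buckets."""

    def __init__(self, per_level=2, max_levels=16):
        self.per_level = per_level
        # levels[i] holds (sum, count) buckets, newest on the left;
        # a bucket at level i covers 2**i stream values.
        self.levels = [deque() for _ in range(max_levels)]

    def append(self, value):
        self.levels[0].appendleft((float(value), 1))
        for i, level in enumerate(self.levels):
            if len(level) <= self.per_level:
                break
            # Merge the two oldest buckets into one coarser bucket.
            s1, c1 = level.pop()
            s2, c2 = level.pop()
            if i + 1 < len(self.levels):
                self.levels[i + 1].appendleft((s1 + s2, c1 + c2))
            # Otherwise the oldest data falls off the end of the summary.

    def _buckets(self):
        # Yield buckets from newest to oldest.
        for level in self.levels:
            yield from level

    def point(self, age):
        """Estimate the value that arrived `age` steps ago (0 = newest)."""
        offset = 0
        for total, count in self._buckets():
            if age < offset + count:
                return total / count
            offset += count
        raise IndexError("age is outside the summarized window")

    def range_sum(self, start_age, end_age):
        """Estimate the sum of values with start_age <= age < end_age."""
        offset, result = 0, 0.0
        for total, count in self._buckets():
            overlap = min(end_age, offset + count) - max(start_age, offset)
            if overlap > 0:
                result += overlap * total / count
            offset += count
        return result
```

For example, after appending the values 1 through 8 with `per_level=2`, `point(0)` and `point(1)` are exact, while older positions return the mean of the bucket that covers them. An inner product query could be estimated the same way by weighting each bucket's mean by the query weights that fall inside it.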
Similar resources
Impact of Document Structure on Hierarchical Summarization
The hierarchical summarization technique summarizes a large document based on the hierarchical structure and salient features of the document. A previous study has shown that hierarchical summarization is a promising technique which can effectively extract the most important information from the source document. Hierarchical summarization has been extended to summarization of multiple documents. Thre...
News Stream Summarization using Burst Information Networks
This paper studies summarizing key information from news streams. We propose simple yet effective models to solve the problem based on a novel and promising representation of text streams – Burst Information Networks (BINets). A BINet can be aware of redundant information, allows global analysis of a text stream, and can be efficiently built and dynamically updated, which perfectly fits the dem...
Neural Summarization by Extracting Sentences and Words
Traditional approaches to extractive summarization rely heavily on human-engineered features. In this work we propose a data-driven approach based on neural networks and continuous sentence features. We develop a general framework for single-document summarization composed of a hierarchical document encoder and an attention-based extractor. This architecture allows us to develop different classe...
On Dense Pattern Mining in Graph Streams
Many massive web and communication network applications create data which can be represented as a massive sequential stream of edges. For example, conversations in a telecommunication network or messages in a social network can be represented as a massive stream of edges. Such streams are typically very large, because of the large amount of underlying activity in such networks. An important app...
A Hierarchical Model for Text Autosummarization
Summarization is an important challenge in natural language processing. Deep learning methods, however, have not been widely used in text summarization, although neural networks have proven to be powerful in natural language processing. In this paper, an encoder-decoder neural network model is applied to text summarization, as an important step toward this task. Besides, a hierarchical mod...